 digital assistant


What Do Indonesians Really Need from Language Technology? A Nationwide Survey

Kautsar, Muhammad Dehan Al, Susanto, Lucky, Wijaya, Derry, Koto, Fajri

arXiv.org Artificial Intelligence

There is an emerging effort to develop NLP for Indonesia's 700+ local languages, but progress remains costly due to the need for direct engagement with native speakers. However, it is unclear what these language communities truly need from language technology. To address this, we conduct a nationwide survey to assess the actual needs of native speakers in Indonesia. Our findings indicate that addressing language barriers, particularly through machine translation and information retrieval, is the most critical priority. Although there is strong enthusiasm for advancements in language technology, concerns around privacy, bias, and the use of public data for AI training highlight the need for greater transparency and clear communication to support broader AI adoption.


Voice CMS: updating the knowledge base of a digital assistant through conversation

Wolny, Grzegorz, Szczerbak, Michał

arXiv.org Artificial Intelligence

In this study, we propose a solution based on a multi-agent LLM architecture and a voice user interface (VUI) designed to update the knowledge base of a digital assistant. Its usability is evaluated in comparison to a more traditional graphical content management system (CMS), with a focus on understanding the relationship between user preferences and the complexity of the information being provided. The findings demonstrate that, while the overall usability of the VUI is rated lower than the graphical interface, it is already preferred by users for less complex tasks. Furthermore, the quality of content entered through the VUI is comparable to that achieved with the graphical interface, even for highly complex tasks. The qualitative results suggest that a hybrid interface combining the strengths of both approaches could address the key challenges identified during the experiment, such as reducing cognitive load through graphical feedback while maintaining the intuitive nature of voice-based interactions. This work highlights the potential of conversational interfaces as a viable and effective method for knowledge management in specific business contexts.


Alexa is a smarter, more conversational AI version of Amazon's digital assistant

Engadget

Following years of development, Amazon's next-generation digital assistant is ready for public use. The model powering Alexa can detect tone and mood and respond accordingly, with a completely new voice -- one that sounds more natural. Moreover, it's only necessary to say "Alexa" once to wake the assistant. It will then follow the conversation. Amazon devices chief Panos Panay said Alexa has contextual awareness, with the ability to "remember" earlier parts of a conversation.


Stop talking to your phone: How to use Type to Siri

Popular Science

Among the changes ushered in with iOS 18.1, iPadOS 18.1, and macOS 15.1 Sequoia is a new Type to Siri option. This means you can carry on a conversation with Apple's digital assistant without having to talk out loud, which is helpful when you're in a quiet library, busy subway car, or anywhere else you can't really use voice control. The ability to type to Siri has actually been available on Apple devices for several years now, but previously it was hidden away in the Accessibility settings and not all that easy to find. Now Apple has given it much more prominence in its operating systems, so typing is just as straightforward as talking.


GraphAide: Advanced Graph-Assisted Query and Reasoning System

Purohit, Sumit, Chin, George, Mackey, Patrick S, Cottam, Joseph A

arXiv.org Artificial Intelligence

Curating knowledge from multiple siloed sources that contain both structured and unstructured data is a major challenge in many real-world applications. Pattern matching and querying represent fundamental tasks in modern data analytics that leverage this curated knowledge. The development of such applications necessitates overcoming several research challenges, including data extraction, named entity recognition, data modeling, and designing query interfaces. Moreover, the explainability of these functionalities is critical for their broader adoption. The emergence of Large Language Models (LLMs) has accelerated the development lifecycle of new capabilities. Nonetheless, there is an ongoing need for domain-specific tools tailored to user activities. The creation of digital assistants has gained considerable traction in recent years, with LLMs offering a promising avenue to develop such assistants utilizing domain-specific knowledge and assumptions. In this context, we introduce an advanced query and reasoning system, GraphAide, which constructs a knowledge graph (KG) from diverse sources and allows users to query and reason over the resulting KG. GraphAide harnesses both the KG and LLMs to rapidly develop domain-specific digital assistants. It integrates design patterns from retrieval augmented generation (RAG) and the semantic web to create an agentic LLM application. GraphAide underscores the potential for streamlined and efficient development of specialized digital assistants, thereby enhancing their applicability across various domains.
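The abstract does not include code; as a rough illustration of the RAG-over-KG pattern it describes, the sketch below retrieves triples matching a query from a toy knowledge graph and assembles them into a grounded LLM prompt. The actual LLM call is omitted, and all names and the naive keyword-match retrieval are placeholders, not GraphAide's implementation.

```python
# Minimal sketch of retrieval-augmented generation over a knowledge graph.
# The KG is a list of (subject, predicate, object) triples; keyword matching
# stands in for real entity linking and graph search.

def retrieve_triples(kg, query, k=3):
    """Return up to k triples whose subject or object appears in the query."""
    words = query.lower().split()
    hits = [t for t in kg if t[0].lower() in words or t[2].lower() in words]
    return hits[:k]

def build_prompt(query, triples):
    """Assemble a grounded prompt; an LLM would answer from these facts only."""
    facts = "\n".join(f"- {s} {p} {o}." for s, p, o in triples)
    return f"Answer using only these facts:\n{facts}\nQuestion: {query}"

kg = [
    ("GraphAide", "constructs", "knowledge-graph"),
    ("GraphAide", "uses", "LLMs"),
    ("RAG", "grounds", "LLMs"),
]
query = "What does GraphAide use?"
prompt = build_prompt(query, retrieve_triples(kg, query))
print(prompt)
```

Grounding the prompt in retrieved triples, rather than free-form context, is what makes the assistant's answers explainable: each response can be traced back to specific KG facts.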


Enhancing Trust and Safety in Digital Payments: An LLM-Powered Approach

Dahiphale, Devendra, Madiraju, Naveen, Lin, Justin, Karve, Rutvik, Agrawal, Monu, Modwal, Anant, Balakrishnan, Ramanan, Shah, Shanay, Kaushal, Govind, Mandawat, Priya, Hariramani, Prakash, Merchant, Arif

arXiv.org Artificial Intelligence

Digital payment systems have revolutionized financial transactions, offering unparalleled convenience and accessibility to users worldwide. However, the increasing popularity of these platforms has also attracted malicious actors seeking to exploit their vulnerabilities for financial gain. To address this challenge, robust and adaptable scam detection mechanisms are crucial for maintaining the trust and safety of digital payment ecosystems. This paper presents a comprehensive approach to scam detection, focusing on the Unified Payments Interface (UPI) in India, with Google Pay (GPay) as a specific use case. The approach leverages Large Language Models (LLMs) to enhance scam classification accuracy and designs a digital assistant to aid human reviewers in identifying and mitigating fraudulent activities. The results demonstrate the potential of LLMs in augmenting existing machine learning models and improving the efficiency, accuracy, quality, and consistency of scam reviews, ultimately contributing to a safer and more secure digital payment landscape. Our evaluation of the Gemini Ultra model on curated transaction data showed a 93.33% accuracy in scam classification. Furthermore, the model demonstrated 89% accuracy in generating reasoning for these classifications. Promisingly, in 32% of cases the model identified accurate reasons for suspected scams that human reviewers had not included in their review notes.
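The paper's pipeline is not reproduced here; as a rough sketch of the reviewer-assist pattern it describes (an LLM returns a scam verdict plus a rationale for a human reviewer), the code below builds a classification prompt and parses a structured model reply. The response format, field names, and transaction fields are assumptions for illustration, not the paper's schema.

```python
def build_review_prompt(transaction):
    """Prompt asking an LLM for a scam verdict with reasoning (fields assumed)."""
    return (
        "Classify the payment below as SCAM or LEGITIMATE and explain why.\n"
        f"Amount: {transaction['amount']}  Note: {transaction['note']}\n"
        "Reply exactly as 'label: <LABEL>' then 'reason: <one sentence>'."
    )

def parse_verdict(response):
    """Parse the 'label: .../reason: ...' reply into a dict for the review UI."""
    fields = dict(line.split(": ", 1) for line in response.strip().splitlines())
    return {"label": fields["label"], "reason": fields["reason"]}

# Example with a canned model reply; no real LLM call is made here.
reply = "label: SCAM\nreason: Urgent payment note and pressure to pay an unknown payee."
print(parse_verdict(reply))
```

Forcing a structured reply is what lets the assistant surface the model's rationale alongside its label, mirroring how the paper's reviewers receive both a classification and a generated reason.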


Meta debuts augmented reality glasses and Judi Dench-voiced AI chatbot

The Guardian

Meta CEO Mark Zuckerberg presented new augmented reality glasses at the company's annual developer conference on Wednesday, debuting a prototype of the next phase in its expansion into smart eyewear. Zuckerberg also announced that Meta AI will be able to talk in the voice of Dame Judi Dench. The glasses, named Orion, have the ability to project digital representations of media, people, games and communications onto the real world. Meta and Zuckerberg have framed the product as a step away from desktop computers and smartphones into eyewear that can perform similar tasks. "A lot of people have said this is the craziest technology they've ever seen," Zuckerberg boasted during his keynote speech, clad in a shirt that read "Aut Zuck aut nihil", substituting his own name into "Aut Caesar aut nihil", Latin for "Either Caesar or nothing".


WorkR: Occupation Inference for Intelligent Task Assistance

Khaokaew, Yonchanok, Xue, Hao, Rahaman, Mohammad Saiedur, Salim, Flora D.

arXiv.org Artificial Intelligence

Occupation information can be utilized by digital assistants to provide occupation-specific personalized task support, including interruption management, task planning, and recommendations. Prior research in the digital workplace assistant domain requires users to input their occupation information for effective support. However, as many individuals switch between multiple occupations daily, current solutions falter without continuous user input. To address this, this study introduces WorkR, a framework that leverages passive sensing to capture pervasive signals from various task activities, addressing three challenges: the lack of a passive sensing architecture, personalization of occupation characteristics, and discovering latent relationships among occupation variables. We argue that signals from application usage, movements, social interactions, and the environment can inform a user's occupation. WorkR uses a Variational Autoencoder (VAE) to derive latent features for training models to infer occupations. Our experiments with an anonymized, context-rich activity and task log dataset demonstrate that our models can accurately infer occupations with more than 91% accuracy across six ISO occupation categories.
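As a hedged illustration of the VAE-style latent-feature step the abstract mentions, the NumPy sketch below encodes a synthetic passive-sensing feature vector into a latent mean and log-variance and draws a sample via the reparameterization trick. The weights are untrained placeholders and the dimensions are arbitrary; this is not WorkR's model, only the general mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic passive-sensing feature vector (e.g., app usage, movement counts).
x = rng.normal(size=8)

# Untrained linear encoder weights, stand-ins for a learned VAE encoder.
W_mu = rng.normal(size=(3, 8))
W_logvar = rng.normal(size=(3, 8))

def encode(x):
    """Map observed features to a latent mean and log-variance."""
    return W_mu @ x, W_logvar @ x

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps; keeps sampling differentiable in training."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

mu, logvar = encode(x)
z = reparameterize(mu, logvar)  # latent features fed to an occupation classifier
print(z.shape)
```

In the paper's framing, a downstream classifier trained on such latent features, rather than the raw signals, is what yields the reported occupation-inference accuracy.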


Digital assistant in a point of sales

Lesiak, Emilia, Wolny, Grzegorz, Przybył, Bartosz, Szczerbak, Michał

arXiv.org Artificial Intelligence

This article investigates the deployment of a Voice User Interface (VUI)-powered digital assistant in a retail setting and assesses its impact on customer engagement and service efficiency. The study explores how digital assistants can enhance user interactions through advanced conversational capabilities with multilingual support. By integrating a digital assistant into a high-traffic retail environment, we evaluate its effectiveness in improving the quality of customer service and operational efficiency. Data collected during the experiment demonstrate varied impacts on customer interaction, revealing insights into the future optimizations of digital assistant technologies in customer-facing roles. This study contributes to the understanding of digital transformation strategies within the customer relations domain, emphasizing the need for service flexibility and user-centric design in modern retail stores.


Google Project Astra hands-on: Full of potential, but it's going to be a while

Engadget

At I/O 2024, Google's teaser for Project Astra gave us a glimpse at where AI assistants are going in the future. It's a multi-modal feature that combines the smarts of Gemini with the kind of image recognition abilities you get in Google Lens, as well as powerful natural language responses. However, while the promo video was slick, after getting to try it out in person, it's clear there's a long way to go before something like Astra lands on your phone. So here are three takeaways from our first experience with Google's next-gen AI. Currently, most people interact with digital assistants using their voice, so right away Astra's multi-modality (i.e., using sight and sound in addition to text and speech to communicate with an AI) feels relatively novel.